Deep-WET: a deep learning-based approach for predicting DNA-binding proteins using word embedding techniques with weighted features

DNA-binding proteins (DBPs) play a significant role in all phases of genetic processes, including DNA recombination, repair, and modification. They are often utilized in drug discovery as fundamental elements of steroids, antibiotics, and anticancer drugs. Predicting them is one of the most challenging tasks in proteomics research. Conventional experimental methods for DBP identification are costly and sometimes biased toward prediction. Therefore, developing powerful computational methods that can accurately and rapidly identify DBPs from sequence information is an urgent need. In this study, we propose a novel deep learning-based method called Deep-WET to accurately identify DBPs from primary sequence information. In Deep-WET, we employed three powerful feature encoding schemes, namely Global Vectors (GloVe), Word2Vec, and fastText, to encode the protein sequence. Subsequently, these three features were sequentially combined and weighted using weights learned through the differential evolution (DE) algorithm. To enhance the predictive performance of Deep-WET, we applied the SHapley Additive exPlanations (SHAP) approach to remove irrelevant features. Finally, the optimal feature subset was input into convolutional neural networks to construct the Deep-WET predictor. Both cross-validation and independent tests indicated that Deep-WET achieved superior predictive performance compared to conventional machine learning classifiers. In addition, in an extensive independent test, Deep-WET was effective and outperformed several state-of-the-art methods for DBP prediction, with an accuracy of 78.08%, an MCC of 0.559, and an AUC of 0.805. This superior performance shows that Deep-WET has a strong capacity to predict DBPs. The web server of Deep-WET and the curated datasets used in this study are available at https://deepwet-dna.monarcatechnical.com/.
The proposed Deep-WET is anticipated to serve the community-wide effort for large-scale identification of potential DBPs.


The overall framework of Deep-WET
The construction process of Deep-WET is depicted in Fig. 1. Deep-WET consists of multiple steps, including data preparation, natural language processing (NLP)-based feature encoding, weighted features, optimal feature subset selection, best classifier selection, and final prediction. In the first stage, three NLP-based word embedding feature encoding techniques were employed (GloVe, Word2Vec, and fastText), and then the optimal subset of features was selected from the weighted features using the SHAP technique. The selected feature subsets from each feature encoding were fed to four ML algorithms and one DL algorithm to build the final prediction models using the training and independent test datasets. Finally, the classifier with the highest cross-validation AUC was used to construct the final predictor.

Data preparation
Developing a reliable, comprehensive, and stringent dataset is the first important step of statistical predictor development. Here, the curated dataset, denoted by $S$, is defined as:

$$S = S_{\mathrm{posi}} \cup S_{\mathrm{nega}} \tag{1}$$

Feature encodings
Word embedding (WE), in which the vocabulary of words can be represented as vectors using large text as an input, is the most popular technique in the area of natural language processing (NLP). WE techniques are able to convert amino acids into a fixed-length vector, where the user needs to define a fixed feature dimension that can provide adequate prediction results. In this study, we implemented three unsupervised embedding techniques to encode protein sequences: GloVe 35, Word2Vec 36, and fastText 37-39.
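To make the idea of treating a protein sequence as text more concrete, the following is a minimal sketch in plain Python. It assumes, purely for illustration, that a sequence is split into overlapping k-mers that play the role of words, and that the per-word vectors are averaged into one fixed-length feature vector. The functions `kmer_tokens` and `embed_sequence`, the choice k = 3, and the toy lookup table are hypothetical; the text above does not state the tokenization or pooling actually used in Deep-WET.

```python
from typing import Dict, List


def kmer_tokens(sequence: str, k: int = 3) -> List[str]:
    """Split a protein sequence into overlapping k-mer 'words' (assumed scheme)."""
    sequence = sequence.strip().upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def embed_sequence(tokens: List[str],
                   table: Dict[str, List[float]],
                   dim: int) -> List[float]:
    """Average the vectors of all known tokens into one fixed-length vector."""
    total = [0.0] * dim
    known = 0
    for tok in tokens:
        vec = table.get(tok)
        if vec is None:          # token missing from the embedding table
            continue
        known += 1
        for d in range(dim):
            total[d] += vec[d]
    return [x / known for x in total] if known else total


# Toy embedding table standing in for a trained GloVe/Word2Vec/fastText model.
toy_table = {"MKV": [1.0, 0.0], "KVL": [0.0, 1.0]}
tokens = kmer_tokens("MKVL", k=3)
features = embed_sequence(tokens, toy_table, dim=2)
```

Whatever the sequence length, the resulting feature vector has the user-defined dimension, which is what allows sequences of different lengths to be fed to the same classifier.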

Word2Vec
Word2Vec, a model developed by Tomas Mikolov at Google, computes and generates high-quality, distributed, and continuous dense representations of words 36. These are unsupervised models that can take in massive textual corpora, create a vocabulary of possible word combinations, and generate dense word embeddings in the vector space. The size of the vocabulary determines the size of the word embedding vectors. This decreases the dimensionality of the resulting dense vector, compared to the high-dimensional sparse vectors generated using the traditional bag of words (BOW). To construct word embeddings, Word2Vec employs two different methods: (1) the continuous bag of words (CBOW) model and (2) the Skip-gram model. Notably, CBOW is faster than the Skip-gram model and generates a better representation of more frequent words 34. On the other hand, the Skip-gram model performs well with a relatively small amount of data and generates a better representation of rare words 34.
Finding the target word $w_t$ through $n$ predictions using the CBOW model can be accomplished by the following equation: Here, the sequence of words $w_{t-1}$ to $w_{t+n}$ represents the context words. The following equation can further simplify the above equation, since the hidden layer can be equivalent to a softmax layer: Here, the output weight matrix between hidden layers is denoted as $W$, and, after the matrix operation, the average value of the input vectors is represented as $h_t$.
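As an illustration of the CBOW prediction step, the sketch below follows the standard textbook formulation rather than a formula taken from this paper: the context word vectors are averaged into $h_t$, multiplied by the output weight matrix $W$, and passed through a softmax to obtain a probability for every candidate target word. The function name `cbow_probabilities` and the toy numbers are assumptions made for this example.

```python
import math
from typing import List


def cbow_probabilities(context_vectors: List[List[float]],
                       output_weights: List[List[float]]) -> List[float]:
    """Return softmax(W h_t), where h_t is the mean of the context vectors."""
    dim = len(context_vectors[0])
    n_ctx = len(context_vectors)
    # h_t: average of the input (context) word vectors
    h = [sum(vec[d] for vec in context_vectors) / n_ctx for d in range(dim)]
    # one score per vocabulary word: row of W dotted with h_t
    scores = [sum(row[d] * h[d] for d in range(dim)) for row in output_weights]
    # numerically stable softmax
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


context = [[1.0, 0.0], [0.0, 1.0]]              # two context word vectors
W = [[2.0, 2.0], [0.0, 0.0], [-2.0, -2.0]]      # three candidate target words
probs = cbow_probabilities(context, W)
```

In this toy setting the first candidate word receives the largest probability because its output vector is best aligned with the averaged context vector.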

GloVe
GloVe is an unsupervised learning vectorization technique. It is a log-bilinear regression model that incorporates both local statistics and global statistics 36. The model is trained on the non-zero entries of a global word-to-word co-occurrence matrix, which tabulates how frequently words co-occur within a given corpus. Collecting these statistics requires a single pass through the entire corpus, which can be expensive for large corpora. Moreover, the resulting representations show interesting linear substructures of the word vector space.
Here, $v_i$ and $v_j$ correspond to the word embeddings of words $i$ and $j$; $X$ represents the word-to-word co-occurrence matrix; and $X_{ij}$ denotes the number of times word $j$ co-occurs in the context of word $i$. Furthermore, the probability of word $j$ occurring in the context of word $i$ is the following:

$$P_{ij} = P(j \mid i) = \frac{X_{ij}}{X_i}, \qquad X_i = \sum_k X_{ik}$$

fastText
fastText, proposed by Facebook 38, is an extension of Word2Vec. It provides tools for learning word representations and for sentence classification. Word vectors are a more organized, numerical, and efficient representation of words and sentences. fastText provides a supervised module to build a model for text classification. This technique breaks an individual word into a bag of n-grams or sub-words and feeds them into the network, which also allows vector representations to be generated for rare or unseen words 37. Since the technique uses the same architecture as Word2Vec, the following equation minimizes the loss of the softmax layer, $l$, over $N$ sequences using the CBOW model: Here, $x_n$ represents the bag of one-hot encoded vectors and $y_n$ represents the label of the $n$th sequence of words. The purpose of using fastText in the present study is to capture the partial order information within a single sequence.
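The sub-word idea behind fastText can be sketched as follows. This is a simplified illustration, not the fastText implementation: a word is wrapped in the boundary markers `<` and `>` and decomposed into character n-grams, and the vector of a rare or unseen word is approximated by averaging the vectors of its known n-grams. The helper names, the n-gram range, and the toy n-gram table are assumptions.

```python
from typing import Dict, List


def char_ngrams(word: str, n_min: int = 3, n_max: int = 5) -> List[str]:
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def subword_vector(word: str,
                   ngram_table: Dict[str, List[float]],
                   dim: int,
                   n_min: int = 3,
                   n_max: int = 5) -> List[float]:
    """Average the vectors of the n-grams that are present in the table."""
    hits = [ngram_table[g] for g in char_ngrams(word, n_min, n_max)
            if g in ngram_table]
    if not hits:
        return [0.0] * dim
    return [sum(vec[d] for vec in hits) / len(hits) for d in range(dim)]


grams = char_ngrams("MKVL", 3, 4)
toy_ngrams = {"<MK": [1.0, 1.0], "VL>": [3.0, 5.0]}   # toy n-gram vectors
unseen = subword_vector("MKVL", toy_ngrams, dim=2, n_min=3, n_max=4)
```

Because the word vector is assembled from its n-grams, a k-mer that never appeared during training can still receive a non-trivial representation as long as some of its sub-words did.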

Weight learning for weighted features
Single-view features represent the discriminative information for each sequence, but combining single-view features into a weighted feature is critical in ML-based DBP prediction. The most common technique involves serially adding ('+') single features. However, this straightforward combination technique lacks any guarantee of discriminative capability and may overlook the relative importance of the base sequence. To address this issue, we employ a differential evolution (DE) method to determine the optimal weights for each feature. The DE algorithm is a variant of evolutionary algorithms and has been applied in various works 40,41 with positive effect. The process we followed for the DE algorithm to learn feature weights from a single feature is as follows:
Step 1: Initialization. Randomly create an initial population, where $FW_i^g$ represents the $i$th solution in the $g$th population and $N$ is the population size. The maximum generation $G_{max}$, crossover rate (CR), and scaling factor (F) are set to 1000, 0.5, and 0.5, respectively.
Step 4: Selection. Select the better vector between the trial vector $TV_i^g$ and the target vector $FW_i^g$ in the following way:
Step 5: Termination. Set $g = g + 1$ and repeat steps 2 to 4 until $g$ is greater than $G_{max}$.
After concluding the DE procedure, we obtain the final weights. In this study, we generated a novel super feature, represented as GloVe + fastText + Word2Vec, by the weighted and sequential fusion of the GloVe, fastText, and Word2Vec features. DE is a powerful optimization algorithm; however, using it for feature weighting in ML presents certain limitations and challenges. DE may struggle with slow convergence, susceptibility to local optima, and sensitivity to parameter choices. Additionally, the algorithm may violate constraints, lack robustness across diverse datasets, and be computationally intensive. To mitigate these challenges, we performed parameter tuning (population size, mutation rate, and crossover probability) in our experiments and considered adaptive strategies for the mutation and crossover rates. Furthermore, parallelization methods help alleviate the computational burden, while strategies such as diversity maintenance mechanisms aim to address convergence issues.
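The DE loop can be sketched in a few lines of plain Python. The sketch assumes the classic DE/rand/1/bin scheme for mutation and crossover, weights clipped to [0, 1], and a fitness that is minimized; none of these details is confirmed by the text above, and the fitness function here is a hypothetical stand-in for the classifier-based objective that would be used in practice. Only the values F = 0.5 and CR = 0.5 come from the description above.

```python
import random
from typing import Callable, List, Tuple


def differential_evolution(fitness: Callable[[List[float]], float],
                           dim: int,
                           pop_size: int = 20,
                           g_max: int = 100,
                           f: float = 0.5,
                           cr: float = 0.5,
                           seed: int = 0) -> Tuple[List[float], float]:
    """Minimise `fitness` with DE/rand/1/bin (assumed variant)."""
    rng = random.Random(seed)
    # Initialization: random weight vectors in [0, 1]
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(g_max):
        new_pop, new_scores = [], []
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)
            trial = []
            for d in range(dim):
                # Crossover: take the mutant gene with probability CR
                if rng.random() < cr or d == j_rand:
                    gene = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    gene = min(1.0, max(0.0, gene))   # keep weights in [0, 1]
                else:
                    gene = pop[i][d]
                trial.append(gene)
            # Selection: keep the better of trial and target
            trial_score = fitness(trial)
            if trial_score <= scores[i]:
                new_pop.append(trial)
                new_scores.append(trial_score)
            else:
                new_pop.append(pop[i])
                new_scores.append(scores[i])
        pop, scores = new_pop, new_scores
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_fitness(weights: List[float]) -> float:
    """Hypothetical objective: squared distance to an arbitrary target."""
    target = [0.2, 0.5, 0.3]
    return sum((w - t) ** 2 for w, t in zip(weights, target))


weights, score = differential_evolution(toy_fitness, dim=3)
```

In a setting like the one described here, `toy_fitness` would be replaced by something such as one minus the cross-validated accuracy of a classifier trained on the weighted, concatenated features, which is far more expensive to evaluate and is the main reason the generation budget matters.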

SHAP-based feature selection scheme
SHAP is an additive feature attribution method introduced by Lundberg and Lee 42, in which each individual prediction is interpreted through the contributions of the features, which are then ordered according to their importance 43. SHAP allocates each feature an importance value for a particular prediction. This SHAP feature selection approach is based on game theory 44; SHAP values break down a prediction to show the impact of each individual feature. Suppose each feature $x_i$ is replaced by a binary variable $z_i$ indicating whether the feature value $x_i$ is present or not. SHAP represents the explanation as:

$$g(z) = \varphi_0 + \sum_{i=1}^{M} \varphi_i z_i$$

In the above equation, $g$ represents the explanation model; $z \in \{0, 1\}^M$ represents the coalition vector, in which 0 and 1 indicate that the corresponding feature is absent or present, respectively; $M$ denotes the number of input features included in the model; $\varphi_0$ is the base value obtained when no feature is present; and $\varphi_i \in \mathbb{R}$ represents the feature attribution value for feature $i$. Following the game theory concept, Shapley values can be calculated using the following equation:

$$\varphi_i = \sum_{S \subseteq M \setminus \{i\}} \frac{|S|!\,(|M| - |S| - 1)!}{|M|!} \left[ f(S \cup \{i\}) - f(S) \right]$$

In the above equation, $M$ represents the set of features in the model; $S$ ranges over all feature subsets of $M$ that exclude feature $i$; the function $f$ computes the total contribution of a given feature set $S$; and the bracketed difference compares the value obtained when feature $i$ is known with the value obtained when it is unknown, over all subsets $S \subseteq M \setminus \{i\}$.
One of the important outputs of SHAP is the bar plot, which consists of rectangular horizontal bars whose length represents the importance of a given feature. As we need the global significance, we sum the absolute Shapley values of each feature. Then, we plot the features sorted in decreasing order. Figure 2A shows the important features based on SHAP contributions for the XGBoost model trained before predicting DBPs. The SHAP summary plot gives a high-level composite view that displays the importance of features together with feature effects. Each point in the plot represents a SHAP value for a specific feature of an instance. The values that pull the model's prediction downwards are on the left, and the values that push the prediction upwards are on the right. On the y-axis, the features are placed in descending order of importance, and on the x-axis, there is a scale representing the Shapley value with a vertical line at zero. The positive and negative values lie to the right and left of that vertical line, respectively. Here, the colors encode the relative magnitude of the feature values across instances. Specifically, low values are colored blue and high values are colored red. Overlapping data points are spread in the y-axis direction to show the distribution of SHAP values for each individual feature. Moreover, in the summary plot, we can clearly observe the relationship between the value of a feature and its effect on the prediction. Figure 2B shows the SHAP summary plot, which orders the important features for identifying DBPs.
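For a handful of features, Shapley values can be computed exactly by enumerating all coalitions, which makes the definition easy to check by hand. The sketch below does this in plain Python and then derives a global ranking by summing absolute values per feature, as described above. It is an illustration only: the SHAP library relies on much faster model-specific approximations, and the names `shapley_values`, `global_importance`, and the additive toy value function are assumptions introduced here.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List


def shapley_values(value_fn: Callable[[FrozenSet[int]], float],
                   n_features: int) -> List[float]:
    """Exact Shapley values by enumerating every subset S of M without i."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # |S|! (|M| - |S| - 1)! / |M|!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def global_importance(per_sample_phi: List[List[float]]) -> List[float]:
    """Sum of absolute Shapley values of each feature over all samples."""
    n_features = len(per_sample_phi[0])
    return [sum(abs(row[j]) for row in per_sample_phi)
            for j in range(n_features)]


# Toy additive 'model': the value of a coalition is the sum of fixed effects.
effects = [2.0, -1.0, 0.5]
phi = shapley_values(lambda s: sum(effects[j] for j in s), n_features=3)

# Two samples' attributions -> global ranking (most important first).
importance = global_importance([phi, [-0.5, 3.0, 0.1]])
ranking = sorted(range(3), key=lambda j: importance[j], reverse=True)
```

For an additive value function each Shapley value equals the feature's own effect, which is a convenient sanity check; the enumeration cost grows exponentially with the number of features, so this exact form is only practical for very small feature sets.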

Implementation of convolutional neural network
CNNs are a type of deep learning model commonly used in applications including recommender systems, image and video recognition, and natural language processing 45,46. In the CNN architecture, the deeper convolutional layers (CLs) learn high-dimensional features by sliding convolution kernels over the outputs of the previous layers, with different hyper-parameter settings such as filters, control layer outputs, stride, and zero-padding.
Pooling layers (PLs) reduce the input feature size and offer translation invariance through local non-linear operations 45. Fully connected layers (FCLs) are utilized to perform the classification task and contain the same number of output neurons as artificial neural networks.
Each neuron is completely linked to all of the nodes in the preceding and subsequent levels 47. After adding one additional CL and max-PL to the process, the technique demonstrated a significant improvement in terms of computational complexity and program runtime. The following equation may be used to compute the outputs of each convolutional layer: Here, the layer index is $l$, while the input and output feature maps are indexed by $m$ and $k$, respectively. Specifically, $y_k^l$ denotes the $k$th output feature map of layer $l$, while $y_m^{(l-1)}$ denotes the $m$th feature map output by layer $l - 1$. The weight tensor and bias term are $W$ and $b$, respectively. Back-propagation and adaptive estimation approaches were used to reduce the cross-entropy loss 47. Our model's output layer is essentially a logistic regression classifier. It takes $y_k^l$ as an input and computes the following: Here, the output $\hat{y}$ is the final predicted score, $W$ is the weight matrix, and $b$ is the bias vector. Each output has size 2, denoting the positive or negative class for the binary classification task of DNA-binding prediction. In order to discover suitable parameters, we minimize the cross-entropy loss by adaptive moment estimation and back-propagation techniques: To improve the model's efficiency, batch normalization and dropout techniques 48 were employed. Dropout deactivates a fraction of the units in the FCLs during the training phase, whereas batch normalization standardizes the inputs to zero mean and unit standard deviation. Furthermore, dropout helped to overcome the problem of overfitting, and batch normalization allowed the model to train with adequate learning rates.
To achieve better performance, hyperparameter optimization plays a vital part in the implementation of the proposed methodology. The following hyperparameters were optimized before training the model: learning rate, number of filters, kernel size, batch size, number of hidden layers, optimizer, dropout layers, and activation function. Here, three convolutional layers are used as hidden layers in the CNN model architecture. In addition, 32, 48, and 64 filters and kernel sizes of 3, 4, and 5 are used. Using ReLU as the activation in the hidden layers and Sigmoid in the fully connected layer gave the desired outcome. Dropout layers with dropout rates of 0.2, 0.3, and 0.5 are used to prevent overfitting. After extensive experimentation, the Adam optimizer with a learning rate of 0.00001 and the binary cross-entropy loss function gave the optimal result. Table 1 lists the hyperparameters used in our method. Detailed parameter settings of the other three classifiers for the different feature encodings are also listed in Table 6.
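The building blocks named above (convolution, ReLU activation, max pooling, and a sigmoid output unit) can be traced through a single forward pass in plain Python. This is a didactic sketch with one filter and hand-picked toy weights, written without TensorFlow so that every step is visible; it is not the Deep-WET architecture, and all function names and numbers are assumptions.

```python
import math
from typing import List


def conv1d(signal: List[float], kernel: List[float], bias: float = 0.0) -> List[float]:
    """Valid 1-D cross-correlation with stride 1, as used in CNN layers."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(signal) - k + 1)]


def relu(values: List[float]) -> List[float]:
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def max_pool(values: List[float], size: int = 2) -> List[float]:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dense_sigmoid(values: List[float], weights: List[float], bias: float) -> float:
    """Single fully connected output unit followed by a sigmoid."""
    return sigmoid(sum(v * w for v, w in zip(values, weights)) + bias)


features = [1.0, 2.0, -1.0, 0.0, 3.0]          # toy embedded input
conv_out = conv1d(features, kernel=[1.0, -1.0])
pooled = max_pool(relu(conv_out), size=2)
score = dense_sigmoid(pooled, weights=[0.5, 0.5], bias=-1.0)
label = "DBP" if score >= 0.5 else "non-DBP"
```

In a framework such as TensorFlow the same chain would be expressed with convolution, pooling, dropout, and dense layers whose weights are learned by back-propagation rather than fixed by hand.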

Performance evaluation
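The performance of Deep-WET was evaluated in terms of six standard performance metrics for the binary classification problem, including accuracy (ACC), sensitivity (Sen), specificity (Spe), Matthews correlation coefficient (MCC), precision (Pre), and F1-score (F1):

$$
\begin{aligned}
\mathrm{ACC} &= \frac{TP + TN}{TP + TN + FP + FN}, &
\mathrm{Sen} &= \frac{TP}{TP + FN}, \\
\mathrm{Spe} &= \frac{TN}{TN + FP}, &
\mathrm{Pre} &= \frac{TP}{TP + FP}, \\
\mathrm{F1} &= \frac{2 \times \mathrm{Pre} \times \mathrm{Sen}}{\mathrm{Pre} + \mathrm{Sen}}, &
\mathrm{MCC} &= \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
\end{aligned}
\tag{13}
$$

where TP, FP, TN, and FN represent the number of true positives (correctly classified as positive), false positives (incorrectly classified as positive), true negatives (correctly classified as negative), and false negatives (incorrectly classified as negative), respectively. Furthermore, the AUC metric was also used to evaluate the performance of the proposed Deep-WET model, where the curve is obtained by plotting the TPR (sensitivity) against the FPR (1 - specificity) at different threshold settings.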
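The threshold-based metrics can be computed directly from the four confusion-matrix counts. The following sketch uses the standard definitions of these metrics; the function name `binary_metrics` and the example counts are illustrative and are not results from this study. It assumes that each class and each predicted label occurs at least once, so that no denominator other than the MCC one needs guarding.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    sen = tp / (tp + fn)                     # sensitivity / recall / TPR
    spe = tn / (tn + fp)                     # specificity = 1 - FPR
    pre = tp / (tp + fp)                     # precision
    f1 = 2 * pre * sen / (pre + sen)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "Sen": sen, "Spe": spe,
            "Pre": pre, "F1": f1, "MCC": mcc}


# Illustrative counts only.
m = binary_metrics(tp=40, fp=5, tn=45, fn=10)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix symmetrically, which is why it is commonly reported alongside ACC for DBP predictors.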

Experimental setup and packages
All tests in this study were carried out on three independent computers with the following settings, using Python version 3.7.7 or above:
• A desktop computer with Intel Core i5 CPU @ 2.71GHz x 4, Windows 10, 64-bit OS and 8 GB RAM.
The CNN classifier and the SHAP technique were implemented with the TensorFlow 2.0 and SHAP 0.39.0 Python libraries for model learning and feature selection, respectively. We utilized improved parameter settings of the CNN algorithm, such as a batch size of 16, a kernel size of 4, 2 hidden layers, and a dropout rate of 0.5. Several graphs were plotted in this experiment using Matplotlib 49, Seaborn 50, and Plotly 51, in addition to pre-installed Python tools.

Performance comparison of different feature encodings
In this section, we systematically evaluated the effect of various feature encodings, including single-feature (GloVe, fastText, and Word2Vec) and weighted-feature (GloVe + fastText, GloVe + Word2Vec, fastText + Word2Vec, and GloVe + fastText + Word2Vec) encodings, on DBP identification. These features were input to a CNN classifier to evaluate the corresponding models using the 5-fold cross-validation test. The cross-validation performance of the CNN classifiers trained with different features is provided in Table 2 and Fig. 3A. It is worth noting that the parameters of the CNN classifiers were carefully determined to improve their performance under the 5-fold cross-validation process.
Among the single features, GloVe outperformed fastText and Word2Vec in terms of all performance metrics. The AUC, ACC, Sen, Spe, and MCC of GloVe were 0.810, 75.00%, 71.15%, 77.63% and 0.485, respectively. Interestingly, the AUC, ACC, and MCC of GloVe were 2.5-1.7%, 1.56-3.91%, and 2.3-6.5% higher than those of fastText and Word2Vec, respectively. A weighted feature was created by adding different combinations of the single feature extraction methods in order to improve the predictive performance. As can be seen from Table 2, the performance of the combination of GloVe, fastText and Word2Vec is better than that of the other three weighted features in terms of all performance metrics. The ACC, Sen, Spe, and MCC of the combination of GloVe, fastText and Word2Vec are 79.07%, 64.10%, 91.49% and 0.585, respectively, which are 0.95-3.29%, 2.14-4.92%, 2.30-11.91%, 0.036-0.077%, 5.26-12.14% and 0.002-0.027% higher than those of the other combination features, respectively. Figure 3A shows that the AUC value of GloVe + fastText + Word2Vec is 0.864, which is larger than that of the other three weighted features. Overall, we observed that the Sen value of individual features was slightly higher than that of the corresponding weighted features in some cases. Moreover, the performance of the top weighted feature (GloVe + fastText + Word2Vec) is significantly higher than that of the single-view features in terms of all evaluation metrics. Weighted features achieve higher prediction performance than the single-view features in terms of all evaluation metrics. Therefore, in this study, the GloVe + fastText + Word2Vec feature outperformed the other single and weighted features and is considered the optimal one in terms of computational cost and predictive performance.

Feature selection approaches improve the predictive performance
The original feature subsets extracted from feature encoding techniques might contain noisy and redundant information that can affect the classifiers' performance. Therefore, we utilized feature selection methods to determine the important features from the original feature subsets. Here, three feature selection techniques, including RFE 52, LASSO 53, and SHAP 42, were utilized for determining the important features from the GloVe + fastText + Word2Vec feature encoding. In our experiment, we ranked all features by their importance obtained from RFE, LASSO, and SHAP and then established six feature subsets consisting of the top-ranked features, ranging from the top 200 to the top 450 features with an interval of 50. Then, for each feature selection technique, the six feature subsets were used to develop individual CNN classifiers, whose prediction results based on 5-fold cross-validation are provided in Table 3.
As seen in Table 3, the optimal subsets containing the top 300, 400, and 400 features derived from the RFE, LASSO and SHAP techniques, respectively, outperformed the other feature sets in terms of both ACC and AUC. Meanwhile, the optimal subset from the SHAP technique outperformed those from the RFE and LASSO techniques. To be specific, the AUC, ACC, Sen, Spe, MCC, Pre, and F1 of the optimal subset from the SHAP technique were 0.883, 82.56%, 69.44%, 92.00%, 0.641, 86.21%, and 0.769, respectively. Thus, the optimal subset derived from the SHAP technique was used to develop our proposed model. To check the effectiveness of the optimal subset, we compared its performance with that of the original feature set. As shown in Tables 2 and 4, the ACC, Sen, MCC and F1 of the optimal subset were 3.49%, 5.34%, 5.60%, and 3.40% higher than those of the original feature set. For convenience of discussion, the CNN classifier combined with the optimal subset from the SHAP technique is referred to herein as Deep-WET.
Altogether, the SHAP technique was a powerful approach for DNA-binding protein datasets. To make a clear comparison of prediction effects, the results of the SHAP importance bar graph on the GloVe + fastText + Word2Vec dataset for 400 feature dimensions are shown in Fig. 2A. In Fig. 2A, the bar plot generated by SHAP shows the important features in the form of horizontal bars, with length representing the importance of features. We summarized the most significant features by sorting them in decreasing order based on absolute Shapley values. In addition, the SHAP summary plot for 400 feature dimensions is shown in Fig. 2B. It provides a high-level composite view that indicates the important features and their effects. Each point in the plot depicts a SHAP score for a particular feature of an instance. Notably, we can observe the relationship between the feature value and the effect on prediction in the SHAP summary plot.
From the above-mentioned observations and discussion, we concluded that the SHAP technique was the more powerful and effective feature selection technique; therefore, this technique was chosen for selecting a subset of features for predicting DBPs herein. In addition, we also applied the SHAP technique to the other types of features, whose prediction results are summarized in Table 4 and Fig. 3. Comparing the performance of the models without feature selection (Table 2 along with Fig. 3A and C) and with SHAP-based feature selection (Table 4 along with Fig. 3B and D) shows that the models with SHAP-based feature selection achieve better performance than the models without feature selection.
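The subset construction described in this section (rank all features, then keep the top 200 to 450 in steps of 50) can be sketched as follows. The importance scores here are deterministic toy values rather than SHAP, RFE, or LASSO output, and the helper names `top_k_subsets` and `select_columns` are hypothetical.

```python
from typing import Dict, List, Sequence


def top_k_subsets(importance: Sequence[float],
                  sizes: Sequence[int]) -> Dict[int, List[int]]:
    """Map each subset size k to the indices of the k most important features."""
    ranked = sorted(range(len(importance)),
                    key=lambda j: importance[j], reverse=True)
    return {k: ranked[:k] for k in sizes if k <= len(ranked)}


def select_columns(row: Sequence[float], indices: Sequence[int]) -> List[float]:
    """Keep only the selected feature columns of one sample."""
    return [row[j] for j in indices]


sizes = list(range(200, 451, 50))                     # 200, 250, ..., 450
# Toy importance scores for 500 features (all distinct by construction).
importance = [((7 * j) % 500) / 500 for j in range(500)]
subsets = top_k_subsets(importance, sizes)

sample = [float(j) for j in range(500)]               # one toy feature vector
reduced = select_columns(sample, subsets[400])
```

Each of the six subsets would then be used to train and cross-validate a separate classifier, and the subset size with the best ACC and AUC is retained.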

Hyperparameter of CNN
The learning rate hyperparameter controls how much the model changes in response to the estimated error each time the weights are updated. Finding the optimal learning rate can be challenging, because a higher learning rate makes the gradient drop faster, and a lower learning rate leads to the gradient hardly converging. Here, a faster rate of gradient drop results in informative and meaningful features failing to be extracted in each iteration, and a lower rate of gradient convergence results in a longer training time. Therefore, five learning rates in the range of 1e-6 to 1e-2 were tested for the proposed model to find the optimal performance. As shown in Table 5, the learning rate of 0.00001 gives the highest performance compared with the remaining four learning rates. However, the sensitivity obtained with the 0.00001 learning rate is lower than that obtained with the 0.0001 and 0.001 learning rates.

Comparison of Deep-WET with conventional ML classifiers
To evaluate the performance of the proposed Deep-WET, we compared its predictive performance with conventional ML classifiers. Herein, the conventional ML classifiers were built using four well-known ML classifiers (i.e., SVM 40, XGBoost 41, LightGBM 44, and CNN 54) and the three NLP-based word embedding techniques (i.e., GloVe, fastText, and Word2Vec). In total, 11 conventional ML classifiers were created in this study. It is noteworthy that the parameters of all ML classifiers were carefully optimized to improve their prediction capability under a 5-fold cross-validation procedure. In these experiments, classifiers were trained a total of 24 times. The prediction performance based on both 5-fold cross-validation and independent tests is listed in Tables 7-8. In addition, the respective graphs are shown in Figs. 4, 5, 6. As can be seen from Table 6, Deep-WET achieved the overall best performance relative to the other ML classifiers in terms of almost all performance metrics, with the only exception of Sen. Meanwhile, CNN-fastText and CNN-Word2Vec were the second-best and third-best classifiers in terms of ACC. To be specific, the ACC values of Deep-WET, CNN-fastText, and CNN-Word2Vec were 82.56%, 80.37%, and 77.91%, respectively. In addition, Deep-WET's AUC, ACC, Spe, and MCC were 3.40%, 2.19%, 2.34% and 3.30% higher, respectively, than those of the second-best method, CNN-fastText. In the independent test, Deep-WET still outperformed the compared ML classifiers in terms of ACC, Spe, MCC, Pre and F1. Deep-WET's ACC, Spe, MCC, Pre, and F1 were 1.37%, 3.84%, 2.60%, 5.13% and 2.10% higher, respectively, than those of the second-best method, CNN-fastText.
To extend the comparison, Figs. 4, 5 and 6 illustrate the cross-validation and independent test performance of our proposed Deep-WET approach along with the four robust classifiers for all the evaluation metrics. From Tables 7, 8 and Figs. 4, 5 and 6, we can summarize several observations as follows: (i) GloVe features obtained the highest predictive results compared with fastText and Word2Vec; however, all three feature-encoding techniques achieved promising performance with the CNN classifier, followed by the LightGBM, XGBoost, and SVM classifiers. Word2Vec achieved relatively lower performance, whereas fastText was slightly better than Word2Vec; (ii) the CNN classifier consistently achieved the highest results compared with the other three classifiers for all three feature-encoding techniques; and (iii) our proposed Deep-WET achieved better performance than the other conventional ML classifiers, highlighting its superior discriminative power.

Comparison of Deep-WET with the state-of-the-art methods
To further validate the discriminative power of the Deep-WET method, we compared its prediction performance against other existing DBP methods, including DPP-PseAAC 10, PseDNA-Pro 19, iDNA-Prot 18, iDNA-Prot|dis 14, PSFM-DBT 22, Local-DPP 11, HMMBinder 23, iDNAProt-ES 7, IKP-DBPPred 24, Xiuquan et al. 55, iDRBP-MMC 56 and TargetDBP 25, on the independent test data. The prediction performance of the existing methods was obtained by submitting the protein sequences in the independent test dataset (148 DBPs and 148 non-DBPs) to their own web servers. Since the web server of iDNAProt-ES was not functional, its prediction results were obtained from a reimplementation of iDNAProt-ES, while those of HMMBinder were obtained from its standalone version 23. Table 8 shows the prediction results of Deep-WET and the other existing methods.
According to the results recorded in Table 8, including the F1 and MCC values as two key evaluation metrics for binary predictions, Deep-WET has superior performance over the other existing methods in terms of ACC, MCC, Pre, and F1. Notably, by comparing the proposed Deep-WET approach with the second-best predictor in terms of ACC, TargetDBP, we observe that Deep-WET achieved improvements of 1.39%, 1.70%, 2.50%, 5.18%, and 3.40% on

Ablation study
Our CNN model has key components such as convolutional filters, pooling strategies, kernel sizes, and fully connected layers. Here, we conducted ablation studies using the GloVe + fastText + Word2Vec dataset under 5-fold CV, assessing how each individual component influences the predictive performance of Deep-WET:
• Remove Specific Convolutional Filters (RSCF): we removed specific filters in the convolutional layers responsible for capturing sequence motifs or patterns linked to DNA binding.
Figure 7 shows the performance comparison of Deep-WET and its four variants in terms of AUC on the GloVe + fastText + Word2Vec dataset. We can observe that our CNN performs better than CNN-RSCF, CNN-VPS, CNN-VKS and CNN-RFCL on the experimental datasets. Here, our CNN obtains the best AUC score of 0.883, which is 0.085%, 0.135%, 0.105% and 0.165% higher than that of CNN-RSCF, CNN-VPS, CNN-VKS and CNN-RFCL, respectively, illustrating that these components of our design improve the predictive performance. Among them, CNN-VPS and CNN-RFCL have the lowest performance. This shows that careful hyperparameter setting of the CNN classifier (see Table 1) is very important and can effectively improve the performance of the CNN.
After clicking the submit button, the server will evaluate the protein sequence and check the format for processing. Prediction results will be generated in a tabular format with detailed information on the word serial number, the predicted probability of DBPs, and the predicted class (DBPs/non-DBPs). Detailed instructions for the web server can be found under the README option. After the final job, users will get a job ID to be used for further queries. The Deep-WET web server application stores this job ID for fifteen days. Deep-WET may have a long computational time when users input large protein sequence files, since Deep-WET needs to run NLP-based word embedding packages to generate discriminative features and fix suitable parameters for the CNN classifier to predict. We strongly suggest
Cellular death is a fundamental and complex biological process that is an underlying driver of many diseases. The authors of 65,66 studied cell death. Our CNN model can be used to classify cells undergoing cell death. This deep learning network has a strong ability to predict cell death. Finally, it is possible to provide a simple Python tool that can be broadly used to detect cell death. Furthermore, our CNN model can recommend specific drugs for the disease.

Figure 1. The flowchart illustrates our proposed methodology. The upper part represents data pre-processing, the middle part depicts feature extraction with various classifiers, and the lower part showcases classification using the CNN model. https://doi.org/10.1038/s41598-024-52653-9

Figure 2. The SHAP importance results for the GloVe + fastText + Word2Vec dataset with 400 feature dimensions. (A) Bar plot generated by SHAP showing the important features in the form of horizontal bars. (B) SHAP summary plot for the 400 feature dimensions.

Figure 3. ROC curves and AUPR curves of CNN classifiers for both single and hybrid feature spaces without feature selection (A, C) and with SHAP-based feature selection (B, D).

• Variation in Pooling Strategies (VPS): adjust the baseline model by altering pooling strategies to evaluate how these changes affect the recognition of relevant sequence features.
• Variation in Kernel Sizes (VKS): explore diverse kernel sizes within the convolutional layers to capture sequence motifs of different lengths associated with DNA binding.
• Removal of Fully Connected Layers (RFCL): create a modified version of the baseline model by removing one or more fully connected layers to examine the significance of global features in the classification task.

Figure 6. Performance comparison of various machine learning classifiers trained with three feature groups, utilizing different classifiers and feature representations, in terms of the AUC and MCC evaluation metrics under 5-fold cross-validation (A, B) and independent testing (C, D).
probabilistic local outlier factor (pLOF) with the extracted features to tackle the challenge of label noise in the dataset, ensuring a trustworthy application; (3) developing a graph-based deep learning model for predicting DBPs with unknown structures.

Figure 7. Comparative analysis between our CNN and its ablation variants on the GloVe + fastText + Word2Vec dataset.

Table 2. Performance comparison of CNN classifiers trained with different feature encodings on the training dataset.

Table 3. Performance comparison of various feature sets derived from different feature selection techniques. Significant values are in bold.

Table 4. Performance comparison of CNN classifiers trained with different optimal feature sets on the training dataset.

Table 5. Cross-validation results of CNN classifiers trained with different learning rates. Significant values are in bold.

Table 6. Cross-validation results of different ML classifiers and feature encoding schemes. Significant values are in bold.

Table 8. Performance comparisons of Deep-WET with the state-of-the-art methods on the independent test dataset. Significant values are in bold. a The prediction performance of the existing methods was obtained by submitting protein sequences in the independent test dataset (148 DBPs and 148 non-DBPs) to their own web servers.